Interstitial Lung Disease (ILD) segmentation labels are highly costly to obtain, leading to small sample sizes in existing datasets and poor performance of trained models. To address this issue, a segmentation algorithm for ILD based on multi-task learning was proposed. Firstly, a multi-task segmentation model was constructed based on U-Net. Then, the generated lung segmentation labels were used as auxiliary task labels for multi-task learning. Finally, the multi-task loss functions were dynamically weighted to balance the losses of the primary task and the auxiliary task. Experimental results on a self-built ILD dataset show that the Dice Similarity Coefficient (DSC) of the multi-task segmentation model reaches 82.61%, which is 2.26 percentage points higher than that of U-Net. These results demonstrate that the proposed algorithm can improve ILD segmentation performance and assist clinicians in ILD diagnosis.
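The dynamic loss weighting in the final step can be illustrated with the well-known homoscedastic-uncertainty scheme of Kendall et al.; this is only one common choice, and both the exact weighting rule and the learnable `log_var` parameters below are assumptions for illustration, not the paper's reported method.

```python
import math

def dynamic_weighted_loss(primary_loss, auxiliary_loss, log_var_p, log_var_a):
    """Combine the primary (ILD) and auxiliary (lung) segmentation losses
    with learnable per-task uncertainty weights: each loss is scaled by
    exp(-log_var) and log_var itself is added as a regularizer, so the
    balance between the two tasks adapts during training."""
    return (math.exp(-log_var_p) * primary_loss + log_var_p
            + math.exp(-log_var_a) * auxiliary_loss + log_var_a)
```

In practice `log_var_p` and `log_var_a` would be trainable parameters updated by backpropagation together with the network weights.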
Aiming at the problem that the classification and localization sub-tasks in object detection require a large receptive field and high resolution respectively, making it difficult to balance these two contradictory requirements, a feature pyramid network algorithm based on an attention mechanism was proposed for object detection. In the algorithm, multiple different receptive fields were integrated to obtain richer semantic information, multi-scale feature maps were fused in a way that weights the importance of each feature map, and the fused feature maps were further refined under the guidance of the attention mechanism. Firstly, multi-scale receptive fields were obtained through multiple atrous convolutions with different dilation rates, which enhanced the semantic information while preserving resolution. Secondly, through the Multi-Level Fusion (MLF) module, feature maps of different scales were brought to the same resolution by upsampling or pooling and then fused. Finally, the proposed Attention-guided Feature Refinement Module (AFRM) was used to refine the fused feature maps, enhancing semantic information and eliminating the aliasing effect caused by fusion. After replacing the Feature Pyramid Network (FPN) in Faster R-CNN with the proposed feature pyramid, experiments were performed on the MS COCO 2017 dataset. The results show that with ResNet (Residual Network) backbones of depth 50 and 101, the Average Precision (AP) of the model reaches 39.2% and 41.0% respectively, which is 1.4 and 1.0 percentage points higher than that of Faster R-CNN with the original FPN. It can be seen that the proposed feature pyramid network algorithm can replace the original feature pyramid and is better suited to object detection scenarios.
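The first step, enlarging the receptive field with parallel atrous convolutions while keeping resolution, follows directly from the receptive-field formula of a single dilated kernel. The 3×3 kernel and the dilation rates 1/3/5 below are illustrative choices, not the paper's reported settings.

```python
def receptive_field(kernel_size, dilation):
    """Effective receptive field of one atrous (dilated) convolution:
    dilation inserts gaps between kernel taps, widening the field of
    view without downsampling the feature map."""
    return dilation * (kernel_size - 1) + 1

# Parallel branches with different dilation rates (rates are illustrative)
rates = [1, 3, 5]
fields = [receptive_field(3, d) for d in rates]  # -> [3, 7, 11]
```

Concatenating or summing such branches yields multi-scale context at full resolution, which is the property the abstract relies on.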
The frequency distribution of International Classification of Diseases (ICD) codes is long-tailed, making multi-label text classification challenging for few-shot codes. MNIC (Meta Network-based automatic ICD Coding model) was proposed to solve the problem of insufficient training data in few-shot code classification. Firstly, instances in the feature space and features in the semantic space were mapped into the same space, and the feature representations of many-shot codes were mapped to their classifier weights, so that meta-knowledge was learned through the meta-network. Secondly, the learned meta-knowledge was transferred from data-abundant many-shot codes to data-poor few-shot codes. Finally, a reasonable explanation was provided for the transferability and generality of the meta-knowledge. Experimental results on the MIMIC-III dataset show that MNIC improves the Micro-F1 and Micro Area Under Curve (Micro-AUC) of few-shot codes by 3.77 and 3.82 percentage points respectively compared to the suboptimal AGM-HT (Adversarial Generative Model conditioned on code descriptions with Hierarchical Tree structure) model, indicating that the proposed model significantly improves few-shot code classification performance.
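The core meta-network idea, learning a mapping from class feature representations of many-shot codes to their classifier weights and reusing that mapping for few-shot codes, can be sketched with a linear meta-network fitted by least squares. All dimensions, the synthetic data, and the linear form are illustrative assumptions, not MNIC's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 many-shot codes with 16-d mean-feature prototypes
protos_many = rng.normal(size=(50, 16))
# their (synthetic) trained classifier weights, generated by a linear map
W_many = protos_many @ rng.normal(size=(16, 16))

# meta-network: fit a map from prototypes to classifier weights
meta_W, *_ = np.linalg.lstsq(protos_many, W_many, rcond=None)

def meta_predict_weights(prototypes, meta_W):
    """Map class prototypes (mean features) to classifier weights."""
    return prototypes @ meta_W

# transfer: synthesize a classifier weight vector for a few-shot code
proto_few = rng.normal(size=(1, 16))
w_few = meta_predict_weights(proto_few, meta_W)
```

The point of the sketch is the transfer step: the mapping is estimated only on data-abundant codes, then applied unchanged to prototypes of data-poor codes.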
Aiming at the limitations of existing long non-coding RNA (lncRNA)-disease association prediction models in comprehensively utilizing the interaction and semantic information of heterogeneous biological networks, an lncRNA-Disease Association prediction model based on a Semantic and Global dual Attention mechanism (SGALDA) was proposed. Firstly, an lncRNA-disease-microRNA (miRNA) heterogeneous network was constructed based on similarities and known associations, and a feature extraction module based on message-passing types was designed to extract and fuse the neighborhood features of homogeneous and heterogeneous nodes, capturing multi-level interactive relations on the heterogeneous network. Secondly, the heterogeneous network was decomposed into multiple semantic sub-networks based on meta-paths, and a Graph Convolutional Network (GCN) was applied to each sub-network to extract semantic features of nodes, capturing high-order interactive relations on the heterogeneous network. Thirdly, a semantic and global dual attention mechanism was used to fuse the semantic and neighborhood features of the nodes to obtain more representative node features. Finally, lncRNA-disease associations were reconstructed by using the inner product of lncRNA node features and disease node features. The 5-fold cross-validation results show that the Area Under Receiver Operating Characteristic curve (AUROC) of SGALDA is 0.994 5±0.000 2 and the Area Under Precision-Recall curve (AUPR) is 0.916 7±0.001 1, both the highest among all comparison models, demonstrating SGALDA's good prediction performance. Case studies on breast cancer and stomach cancer further demonstrate SGALDA's ability to identify potential lncRNA-disease associations, indicating that it has the potential to be a reliable lncRNA-disease association prediction model.
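The semantic attention step, fusing node features extracted from several meta-path sub-networks, can be sketched as follows. The scoring function (tanh plus a query vector, averaged over nodes) and the query itself are illustrative assumptions, not SGALDA's exact formulation.

```python
import numpy as np

def semantic_attention(features, q):
    """Fuse node features from several meta-path sub-networks.

    features: array of shape (num_metapaths, num_nodes, dim)
    q:        attention query vector of shape (dim,)

    Each sub-network receives one scalar importance score (averaged
    over its nodes), the scores are softmax-normalized, and the
    per-metapath features are combined with those weights.
    """
    scores = np.einsum('mnd,d->m', np.tanh(features), q) / features.shape[1]
    alpha = np.exp(scores - scores.max())   # stable softmax over metapaths
    alpha /= alpha.sum()
    return np.einsum('m,mnd->nd', alpha, features)
```

When all sub-networks carry identical features the weights become uniform and the fusion reduces to an average, which is a useful sanity check.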
Most existing computational models for predicting associations between circular RNAs (circRNAs) and diseases use biological knowledge such as circRNA- and disease-related data, and mine potential association information by combining known circRNA-disease association pairs. However, these models suffer from the inherent sparsity of the network composed of known associations and the scarcity of negative samples, resulting in poor prediction performance. Therefore, inductive matrix completion and a self-attention mechanism were introduced for two-stage fusion on top of a graph auto-encoder to predict circRNA-disease associations, yielding the model GIS-CDA (Graph auto-encoder combining Inductive matrix completion and Self-attention mechanism for predicting CircRNA-Disease Association). Firstly, integrated circRNA similarity and integrated disease similarity were calculated, and the graph auto-encoder was used to learn the potential features of circRNAs and diseases, obtaining low-dimensional representations. Secondly, the learned features were input into inductive matrix completion to improve the similarity and dependence between nodes. Thirdly, the circRNA and disease feature matrices were integrated into a circRNA-disease feature matrix to enhance the stability and accuracy of prediction. Finally, a self-attention mechanism was introduced to extract important features from the feature matrix and reduce the dependence on other biological information.
The results of five-fold and ten-fold cross-validation show that the Area Under Receiver Operating Characteristic curve (AUROC) values of GIS-CDA are 0.930 3 and 0.939 3 respectively; the former is 13.19, 35.73, 13.28 and 5.01 percentage points higher than those of the prediction models based on KATZ measures for Human CircRNA-Disease Association (KATZHCDA), Deep Matrix Factorization for CircRNA-Disease Association (DMFCDA), Random Walk with Restart (RWR) and Speedup Inductive Matrix Completion for CircRNA-Disease Associations (SIMCCDA), respectively. The Area Under Precision-Recall curve (AUPR) values of GIS-CDA are 0.227 1 and 0.234 0 respectively; the former is 21.72, 22.43, 21.96 and 13.86 percentage points higher than those of the above comparison models. In addition, ablation experiments and case studies on the circRNADisease, circ2Disease and circR2Disease datasets further validate the good performance of GIS-CDA in predicting potential circRNA-disease associations.
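The decoding step that graph auto-encoder models of this kind share, scoring every circRNA-disease pair from the learned low-dimensional embeddings, can be sketched as an inner-product decoder with a sigmoid. This is the generic graph auto-encoder decoder, not necessarily GIS-CDA's exact scoring function.

```python
import numpy as np

def reconstruct_associations(Z_circ, Z_dis):
    """Score each circRNA-disease pair as the sigmoid of the inner
    product of their low-dimensional embeddings; the result is a
    dense score matrix over all pairs, from which candidate
    associations can be ranked."""
    logits = Z_circ @ Z_dis.T
    return 1.0 / (1.0 + np.exp(-logits))
```

With all-zero embeddings every score is exactly 0.5, i.e. maximally uncertain, which makes the behavior easy to verify.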
Aiming at the problem of inaccurate prediction of edges and the farthest regions in monocular image depth estimation, a monocular depth estimation method based on a Pyramid Split attention Network (PS-Net) was proposed. Firstly, building on the Boundary-induced and Scene-aggregated Network (BS-Net), a Pyramid Split Attention (PSA) module was introduced in PS-Net to process the spatial information of multi-scale features and effectively establish long-term dependence among multi-scale channel attentions, thereby extracting boundaries with sharply changing depth gradients as well as the farthest regions. Then, the Mish function was used as the activation function in the decoder to further improve the performance of the network. Finally, training and evaluation were performed on the NYUD v2 (New York University Depth dataset v2) and iBims-1 (independent Benchmark images and matched scans v1) datasets. Experimental results on the iBims-1 dataset show that the proposed network reduces the Directed Depth Error (DDE) by 1.42 percentage points compared with BS-Net, and the proportion of correctly predicted depth pixels reaches 81.69%. These results demonstrate that the proposed network achieves high accuracy in depth prediction.
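The Mish activation used in the decoder has a simple closed form, x · tanh(softplus(x)), which is smooth and non-monotonic (slightly negative for small negative inputs):

```python
import math

def mish(x):
    """Mish activation: x * tanh(softplus(x)), where
    softplus(x) = ln(1 + exp(x)); smooth, unbounded above,
    bounded below."""
    return x * math.tanh(math.log1p(math.exp(x)))
```

For large positive x it approaches the identity, and mish(0) = 0, so it behaves much like a smoothed ReLU.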
Aiming at the problems that two-stream networks cannot achieve end-to-end recognition because optical flow maps must be computed in advance to extract motion information, and that three-dimensional convolutional networks have a large number of parameters, an action recognition method based on video spatio-temporal features was proposed. In this method, the spatio-temporal information in videos can be extracted efficiently without any optical flow calculation or three-dimensional convolution operation. Firstly, a motion information extraction module based on an attention mechanism was used to capture the motion shift information between adjacent frames, thereby simulating the function of optical flow in two-stream networks. Secondly, a decoupled spatio-temporal information extraction module was proposed to replace three-dimensional convolution in order to encode the spatio-temporal information. Finally, the two modules were embedded into a two-dimensional residual network to perform end-to-end action recognition. Experiments were carried out on several mainstream action recognition datasets. The results show that when only using RGB (Red-Green-Blue) video frames as input, the recognition accuracies of the proposed method on the UCF101, HMDB51 and Something-Something-V1 datasets are 96.5%, 73.1% and 46.6% respectively. Compared with the Temporal Segment Network (TSN) method using a two-stream structure, the proposed method improves the recognition accuracy on UCF101 by 2.5 percentage points. It can be seen that the proposed method can extract spatio-temporal features from videos efficiently.
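The motion module's core idea, deriving a motion cue from shifts between adjacent frames instead of precomputed optical flow, can be sketched with frame differences reweighted by a simple normalized attention map. This is a much-simplified stand-in for the paper's attention-based module.

```python
import numpy as np

def motion_cue(frames):
    """Approximate motion information from adjacent-frame differences.

    frames: array of shape (T, H, W)
    Returns (T-1, H, W): each difference map reweighted by a crude
    attention map that emphasizes locations with strong change.
    """
    diffs = frames[1:] - frames[:-1]  # frame-to-frame shift
    attn = np.abs(diffs) / (np.abs(diffs).sum(axis=(1, 2), keepdims=True) + 1e-8)
    return diffs * attn
```

For a static clip the differences vanish and the cue is exactly zero, mirroring how optical flow would report no motion.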
Focusing on the target tracking problem in the docking stage of autonomous aerial refueling, a joint detection and tracking algorithm for targets in aerial refueling scenes was proposed. In the algorithm, the CenterTrack network, which integrates detection and tracking, was adopted to track the drogue. To address its large computational cost and long training time, the network was improved in two aspects: model design and network optimization. Firstly, a dilated convolution group was introduced into the tracker to make the network lighter without changing the size of the receptive field. At the same time, the convolutional layers of the output part were replaced with depthwise separable convolutional layers to reduce the network parameters and computational cost. Then, the network was further optimized by combining the Stochastic Gradient Descent (SGD) method with the Adaptive moment estimation (Adam) algorithm so that it converges to a stable state faster. Finally, videos of real-world aerial refueling scenes and ground simulations were made into a dataset in the corresponding format for experimental verification. Training and testing were carried out on the self-built drogue dataset and the MOT17 (Multiple Object Tracking 17) public dataset respectively, verifying the effectiveness of the proposed algorithm. Compared with the original CenterTrack network, the improved network Tiny-CenterTrack reduces training time by about 48.6% and improves real-time performance by 8.8%. Experimental results show that the improved network can effectively save computing resources and improve real-time performance to a certain extent without loss of network performance.
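The parameter saving from swapping a standard convolution for a depthwise separable one can be checked by counting weights. The 256-channel, 3×3 configuration below is illustrative, not CenterTrack's actual head size.

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise (one k x k filter per input channel) followed by a
    pointwise 1 x 1 convolution, as used to slim the output head."""
    return c_in * k * k + c_in * c_out

std = conv_params(256, 256, 3)                  # 589 824 weights
sep = depthwise_separable_params(256, 256, 3)   #  67 840 weights
```

Here the separable version needs fewer than an eighth of the weights, which is the kind of reduction that makes the tracker lighter.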
As networks for Medical Named Entity Recognition (MNER) grow deeper, deep learning-based recognition models face an imbalance between recognition accuracy and computing power requirements. Aiming at this problem, a medical named entity recognition model based on deep auto-encoding, CasSAttMNER (Cascade Self-Attention Medical Named Entity Recognition), was proposed. Firstly, a depth-difference balance strategy between encoding and decoding was used in the model, with the distilled Transformer language model RBT6 as the encoder to reduce the encoding depth and the computing power required for training and application. Then, a Bidirectional Long Short-Term Memory (BiLSTM) network and a Conditional Random Field (CRF) were used to build a cascaded multi-task dual decoder that performs entity mention sequence labeling and entity class determination. Finally, based on the self-attention mechanism, the model design was optimized by effectively representing the implicit decoding information between entity classes and entity mentions. Experimental results show that the F scores of CasSAttMNER on two Chinese medical entity datasets reach 0.943 9 and 0.945 7, which are 3 percentage points and 8 percentage points higher than those of the baseline model respectively, verifying that this model further improves decoder performance.
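The cascaded dual-decoder idea, first labeling entity mention spans and then determining each mention's class, can be sketched as follows. The plain BIO tags and the pluggable span classifier are illustrative; the actual model realizes both stages with BiLSTM-CRF decoders.

```python
def decode_cascade(bio_tags, classify_span):
    """Cascaded decoding: stage 1 extracts mention spans from a BIO
    tag sequence; stage 2 assigns each span a class via classify_span
    (any callable taking (start, end))."""
    spans, start = [], None
    for i, tag in enumerate(bio_tags + ['O']):  # sentinel closes last span
        if tag == 'B':
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == 'O' and start is not None:
            spans.append((start, i))
            start = None
    return [(s, e, classify_span(s, e)) for s, e in spans]
```

Separating the two stages is what lets the second decoder condition on complete mention spans rather than on individual tokens.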
Reasoning and computation over graph-structured data is an important task, whose main challenge is how to represent graph-structured knowledge so that machines can easily understand and use it. After comparing existing representation learning models, it was found that models based on random walk methods tend to ignore the special effect of attributes on the associations between nodes. Therefore, a hybrid random walk method based on node adjacency and attribute association was proposed. Firstly, attribute weights were calculated from the common attribute distribution among adjacent nodes, and the sampling probability from each node to each attribute was obtained. Then, network information was extracted from adjacent nodes and from non-adjacent nodes sharing common attributes. Finally, a network representation learning model based on the node-attribute bipartite graph was constructed, and node vector representations were learned from the sampled sequences. Experimental results on the Flickr, BlogCatalog and Cora public datasets show that the average Micro-F1 accuracy of node classification with the node vector representations obtained by the proposed model is 89.38%, which is 2.02 percentage points higher than that of GraphRNA (Graph Recurrent Networks with Attributed random walk) and 21.12 percentage points higher than that of the classical DeepWalk model. Meanwhile, comparison of different random walk methods shows that increasing the sampling probabilities of attributes that promote node association improves the information contained in the sampled sequences.
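The hybrid walk can be sketched as follows: at each step the walker either moves to a graph neighbor or hops through a shared attribute to a possibly non-adjacent node. The fixed mixing probability `alpha` and the uniform attribute choice are simplifications; the paper instead derives attribute sampling probabilities from the common attribute distribution.

```python
import random

def hybrid_walk(adj, node_attrs, attr_nodes, start, length, alpha=0.5, seed=0):
    """Random walk over a node-attribute bipartite view.

    adj:        {node: [neighbor, ...]}        graph adjacency
    node_attrs: {node: [attribute, ...]}       node -> attributes
    attr_nodes: {attribute: [node, ...]}       attribute -> nodes
    With probability alpha step to a neighbor; otherwise hop through a
    shared attribute, which can reach non-adjacent nodes.
    """
    rng = random.Random(seed)
    walk, cur = [start], start
    for _ in range(length - 1):
        if rng.random() < alpha and adj.get(cur):
            cur = rng.choice(adj[cur])
        elif node_attrs.get(cur):
            attr = rng.choice(node_attrs[cur])
            cur = rng.choice(attr_nodes[attr])
        walk.append(cur)
    return walk
```

The resulting sequences mix structural and attribute proximity and would then be fed to a skip-gram-style representation learner.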
In Named Entity Recognition (NER) for elementary mathematics, traditional word embeddings cannot represent the polysemy of words, and some local features are ignored during feature extraction. To address these problems, an NER method for elementary mathematical text based on BERT (Bidirectional Encoder Representation from Transformers), named BERT-BiLSTM-IDCNN-CRF (BERT-Bidirectional Long Short-Term Memory-Iterated Dilated Convolutional Neural Network-Conditional Random Field), was proposed. Firstly, BERT was used for pre-training. Then, the word vectors obtained by training were input into BiLSTM and IDCNN to extract features, after which the output features of the two neural networks were merged. Finally, the output was obtained through correction by the CRF. Experimental results show that the F1 score of BERT-BiLSTM-IDCNN-CRF is 93.91% on a dataset of elementary mathematics test questions, which is 4.29 percentage points higher than that of the BiLSTM-CRF baseline model and 1.23 percentage points higher than that of the BERT-BiLSTM-CRF model, and the F1 scores of the proposed method on line, angle, plane, sequence and other entities are all higher than 91%, verifying the effectiveness of the proposed method for elementary mathematical entity recognition. In addition, after adding an attention mechanism to the proposed model, the recall decreases by 0.67 percentage points while the accuracy increases by 0.75 percentage points, indicating that introducing the attention mechanism has little effect on the overall recognition performance.
Focusing on the issue that most existing social recommendation algorithms ignore the influence of association relationships between items on recommendation accuracy and fail to effectively combine user ratings with trust data, a Social recommendation algorithm combining Trust implicit similarity and Score similarity (SocialTS) was proposed. Firstly, the score similarity and trust implicit similarity between users were combined linearly to obtain reliable similar friends of each user. Then, the trust relationship was integrated into the correlation analysis of items to obtain modified similar items. Finally, similar users and items were added to the Matrix Factorization (MF) model as regularization terms, yielding more accurate feature representations of users and items. Experimental results show that on the FilmTrust and CiaoDVD datasets, when the latent feature dimension is 10, compared with the mainstream social recommendation algorithm Trust-based Singular Value Decomposition (TrustSVD), SocialTS reduces the Root Mean Square Error (RMSE) by 4.23% and 8.38% respectively, and the Mean Absolute Error (MAE) by 4.66% and 6.88% respectively. SocialTS can not only effectively alleviate the user cold-start problem, but also accurately predict users' actual ratings under different numbers of ratings, and has good robustness.
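The final step, adding similar users (and, symmetrically, similar items) as regularization terms to the MF objective, can be sketched as a loss function. Only the user-side social term is shown, and the weights `lam` and `beta` are illustrative, not SocialTS's tuned values.

```python
import numpy as np

def social_mf_loss(R, U, V, sim_users, lam=0.1, beta=0.1):
    """MF loss with a social regularizer.

    R:         (m, n) rating matrix, 0 for unobserved entries
    U, V:      (m, d) user and (n, d) item latent factors
    sim_users: list of (user, similar_friend) index pairs
    The social term pulls each user's factors toward those of
    similar/trusted friends; lam is the usual L2 weight.
    """
    mask = R > 0
    err = (((R - U @ V.T) * mask) ** 2).sum()
    l2 = lam * (np.sum(U ** 2) + np.sum(V ** 2))
    social = beta * sum(np.sum((U[u] - U[f]) ** 2) for u, f in sim_users)
    return err + l2 + social
```

Minimizing this by gradient descent drives similar users toward nearby points in the latent space, which is how trust information helps with cold-start users who have few ratings.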
Distinguishing similar, counterfeit and deteriorated slices among Chinese herbal slices plays a vital role in the clinical application of Chinese medicine. Traditional manual identification methods are subjective and error-prone, whereas computer vision-based classification of Chinese herbal slices is superior in speed and accuracy, enabling intelligent screening of Chinese herbal slices. Firstly, the general steps of computer vision-based Chinese medicine recognition algorithms were introduced, and the technical development status of preprocessing, feature extraction and recognition models for Chinese medicine images was reviewed. Then, 12 classes of similar and easily confused Chinese herbal slices were selected as a case study. By constructing a dataset of 9 156 pictures of Chinese herbal slices, the recognition performance of traditional recognition algorithms and various deep learning models was analyzed and compared. Finally, the difficulties and future development trends of computer vision for Chinese herbal slices were summarized and discussed.
In Positron Emission Tomography (PET) image reconstruction, traditional iterative algorithms suffer from loss of detail and blurred object edges. A high-quality Median Prior (MP) reconstruction algorithm based on the correlation coefficient and Forward-And-Backward (FAB) diffusion was proposed to solve this problem. Firstly, a characteristic factor called the correlation coefficient was introduced to represent local gray-level information of the image, and a new model was built by combining the correlation coefficient with the forward-and-backward diffusion model. Then, considering that the forward-and-backward diffusion model has the advantage of handling background and edges separately, the proposed model was applied to the Maximum A Posteriori (MAP) reconstruction algorithm with the median prior distribution, yielding a median prior reconstruction algorithm based on forward-and-backward diffusion. Simulation results show that the new algorithm can remove image noise while preserving object edges well, and the Signal-to-Noise Ratio (SNR) and Root Mean Squared Error (RMSE) further confirm the improvement in reconstructed image quality.
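The separate treatment of background and edges can be illustrated with a forward-and-backward diffusion coefficient in the classical Gilboa-style form: positive (smoothing) where the gradient is weak, negative (sharpening) near a target edge strength. The specific functional form and all parameter values below are assumptions for illustration, not necessarily the paper's model.

```python
def fab_diffusion_coeff(grad, kf=10.0, kb=40.0, w=10.0, alpha=0.2, n=4, m=1):
    """Forward-and-backward diffusion coefficient c(|grad|).

    forward term:  ~1 for small gradients -> smoothing of background
    backward term: peaks around edge strength kb -> negative c there,
                   i.e. inverse diffusion that sharpens edges
    """
    forward = 1.0 / (1.0 + (grad / kf) ** n)
    backward = alpha / (1.0 + ((grad - kb) / w) ** (2 * m))
    return forward - backward
```

Plugging such a coefficient into the diffusion term of the MAP iteration is what lets the prior denoise flat regions while keeping object edges crisp.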
Complete coverage algorithms for a single mobile robot were classified into three kinds: potential field grid based, cellular decomposition based, and local-to-global transformation based approaches. Their performance was analyzed, their advantages and disadvantages were pointed out, and improved methods were discussed and analyzed. In addition, complete coverage path planning algorithms for multiple mobile robots, which combine single-robot path planning algorithms with task allocation, were investigated. Finally, further research directions for complete coverage algorithms for mobile robots were discussed. The analysis shows that making full use of the complementary advantages of current algorithms, or drawing on the strengths of multiple disciplines, may lead to more efficient complete coverage algorithms for mobile robots.